1.
IEEE Trans Vis Comput Graph ; 29(8): 3519-3534, 2023 Aug.
Article in English | MEDLINE | ID: mdl-35353702

ABSTRACT

Synthesizing human motion with a global structure, such as a choreography, is a challenging task. Existing methods tend to concentrate on locally smooth pose transitions and neglect the global context or theme of the motion. In this work, we present a music-driven motion synthesis framework that generates long-term sequences of human motions which are synchronized with the input beats and jointly form a global structure that respects a specific dance genre. In addition, our framework enables the generation of diverse motions that are controlled by the content of the music, not only by the beat. Our music-driven dance synthesis framework is a hierarchical system that consists of three levels: pose, motif, and choreography. The pose level consists of an LSTM component that generates temporally coherent sequences of poses. The motif level guides sets of consecutive poses to form a movement that belongs to a specific distribution, using a novel motion perceptual loss, while the choreography level selects the order of the performed movements and drives the system to follow the global structure of a dance genre. Our results demonstrate the effectiveness of our music-driven framework in generating natural and consistent movements across various dance types, with control over the content of the synthesized motions and respect for the overall structure of the dance.


Subjects
Dance, Music, Humans, Auditory Perception, Computer Graphics, Movement
2.
IEEE Trans Image Process ; 31: 2040-2052, 2022.
Article in English | MEDLINE | ID: mdl-35167452

ABSTRACT

Image matting is widely studied for accurate foreground extraction. Most algorithms, including deep-learning-based solutions, require a carefully edited trimap. Recent works attempt to combine the segmentation and matting stages in one CNN model, but errors occurring at the segmentation stage lead to unsatisfactory mattes. We propose a user-guided approach for practical human matting. More precisely, we provide a good automatic initial matte and a natural form of interaction that reduces the workload of drawing trimaps and allows users to guide the matting in ambiguous situations. We also combine the segmentation and matting stages in an end-to-end CNN architecture and introduce a residual-learning module to support convenient stroke-based interaction. The proposed model learns to propagate the input trimap and modify the deep image features, which can efficiently correct segmentation errors. Our model supports arbitrary forms of trimaps, from carefully edited to totally unknown maps, and allows users to choose from different foreground estimations according to their preference. We collected a large human matting dataset consisting of 12K real-world human images with complex backgrounds and human-object relations. The proposed model is trained on the new dataset with a novel trimap generation strategy that enables the model to tackle different test situations and greatly improves interaction efficiency. Our method outperforms other state-of-the-art automatic methods and achieves competitive accuracy when high-quality trimaps are provided. Experiments indicate that our interactive matting strategy is superior to separately estimating the trimap and alpha matte using two models.


Subjects
Algorithms, Computer-Assisted Image Processing, Humans, Computer-Assisted Image Processing/methods
3.
IEEE Trans Vis Comput Graph ; 27(7): 3305-3317, 2021 Jul.
Article in English | MEDLINE | ID: mdl-32011257

ABSTRACT

We present prominent structures in video: a representation of visually strong, spatially sparse, and temporally stable structural units, for use in video analysis and editing. With a novel quality measure of prominent structures in video, we develop a general framework for prominent-structure computation and an efficient hierarchical algorithm for aligning structures between a pair of videos. The prominent structural unit map is proposed to encode both binary prominence guidance and numerical strength and geometry details for each video frame. Even when the detailed appearance of two videos differs, the proposed alignment algorithm can find matching prominent-structure sub-volumes. Prominent structures in video support a wide range of video analysis and editing applications, including graphic match-cuts between successive videos, instant cut editing, finding transition portals in a video collection, structure-aware video re-ranking, and visualizing differences in human actions.

4.
Article in English | MEDLINE | ID: mdl-31613764

ABSTRACT

General image completion and extrapolation methods often fail on portrait images where parts of the human body need to be recovered, a task that requires accurate synthesis of human body structure and appearance. We present a two-stage deep learning framework for tackling this problem. In the first stage, given a portrait image with an incomplete human body, we extract a complete, coherent human body structure through a human parsing network, which focuses on structure recovery inside the unknown region with the help of full-body pose estimation. In the second stage, we use an image completion network to fill the unknown region, guided by the structure map recovered in the first stage. For realistic synthesis, the completion network is trained with both a perceptual loss and a conditional adversarial loss. We further propose a face refinement network to improve the fidelity of the synthesized face region. We evaluate our method on publicly available portrait image datasets and show that it outperforms other state-of-the-art general image completion methods. Our method enables new portrait image editing applications such as occlusion removal and portrait extrapolation. We further show that the proposed general learning framework can be applied to other types of images, e.g., animal images.

5.
Article in English | MEDLINE | ID: mdl-30507533

ABSTRACT

Video stabilization techniques are essential for most hand-held captured videos due to high-frequency shakes. Several 2D, 2.5D, and 3D-based stabilization techniques have been presented previously, but to our knowledge, no solutions based on deep neural networks had been proposed to date. The main reasons for this omission are a shortage of training data and the challenge of modeling the problem with neural networks. In this paper, we present a video stabilization technique using a convolutional neural network. Previous works usually propose an offline algorithm that smooths a holistic camera path based on feature matching. Instead, we focus on low-latency, real-time camera path smoothing that does not explicitly represent the camera path and does not use future frames. Our neural network model, called StabNet, progressively learns a set of mesh-grid transformations for each input frame from the previous set of stabilized camera frames, implicitly creating the corresponding stable latent camera paths. To train the network, we collected a dataset of synchronized steady and unsteady video pairs using specially designed hand-held hardware. Experimental results show that our proposed online method performs comparably to traditional offline video stabilization methods without using future frames, while running about 10× faster. More importantly, StabNet is able to handle low-quality videos, such as night-scene, watermarked, blurry, and noisy videos, where existing methods fail at feature extraction or matching.

6.
IEEE Trans Image Process ; 27(12): 5854-5865, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30047880

ABSTRACT

Selfie photography with a hand-held camera is becoming a popular media type. Although convenient and flexible, it suffers from low camera-motion stability, a small field of view, and limited background content. These limitations can annoy users, especially when touring a place of interest and taking selfie videos. In this paper, we present a novel method to create what we call a BiggerSelfie, which deals with these shortcomings. Using a video of the environment that has partial content overlap with the selfie video, we stitch plausible frames selected from the environment video to the original selfie frames and stabilize the composed video content with a portrait-preserving constraint. With the proposed method, one can easily obtain a stable selfie video with expanded background content by merely capturing some background shots. We show various results and several evaluations that demonstrate the applicability of our method.

7.
IEEE Trans Image Process ; 27(4): 1735-1747, 2018 Apr.
Article in English | MEDLINE | ID: mdl-28880175

ABSTRACT

Hyper-lapse video with a high speed-up rate is an efficient way to overview long videos, such as human activity in first-person view. Existing hyper-lapse creation methods produce a fast-forward effect using only one video source. In this paper, we present a novel hyper-lapse creation approach based on multiple spatially overlapping videos. We assume the videos share a common view or location, and we find transition points where jumps from one video to another may occur. We represent the collection of videos as a hyper-lapse transition graph, whose edges represent possible hyper-lapse frame transitions. To create a hyper-lapse video, a shortest-path search is performed on this digraph to optimize frame sampling and assembly simultaneously. Finally, we render the hyper-lapse results by applying video stabilization and appearance smoothing to the selected frames. Our technique can synthesize novel virtual hyper-lapse routes that may not exist originally. We show various application results on both indoor and outdoor video collections with static scenes, moving objects, and crowds.
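The frame-selection step described above is a shortest-path search over the transition digraph. A minimal sketch in Python (the node layout and edge costs here are hypothetical; the paper's actual costs combine speed-up deviation with transition quality):

```python
import heapq

def hyperlapse_path(edges, source, target):
    """Dijkstra's shortest path on a digraph of frame transitions.

    edges maps a node to [(neighbor, cost), ...], where a node is a
    (video_id, frame_index) pair; within-video edges skip frames at the
    desired speed-up, cross-video edges jump at shared-view transition
    points. Assumes target is reachable from source.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in edges.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # backtrack to recover the selected frame sequence
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

With two overlapping videos, a cheap transition through the second video can beat staying in the first, which is how a virtual route not present in any single source video can emerge.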

8.
IEEE Trans Vis Comput Graph ; 20(11): 1507-18, 2014 Nov.
Article in English | MEDLINE | ID: mdl-26355330

ABSTRACT

Our homes and workspaces are filled with collections of dozens of artifacts laid out on surfaces such as shelves, counters, and mantles. The content and layout of these arrangements reflect both context, e.g., kitchen or living room, and style, e.g., neat or messy. Manually assembling such arrangements in virtual scenes is highly time-consuming, especially when one needs to generate multiple diverse arrangements for numerous support surfaces and living spaces. We present a data-driven method, designed specifically for artifact arrangement, which automatically populates empty surfaces with diverse, believable arrangements of artifacts in a given style. The input to our method is an annotated photograph or a 3D model of an exemplar arrangement that reflects the desired context and style. Our method leverages this exemplar to generate diverse arrangements reflecting the exemplar's style for arbitrary furniture setups and layout dimensions. To simultaneously achieve scalability, diversity, and style preservation, we define a valid solution space of arrangements that reflect the input style, and we obtain solutions within this space using barrier functions and stochastic optimization.
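As a toy illustration of the barrier-function idea (not the paper's actual formulation), consider placing items on a 1D shelf: a log-barrier over the gaps makes overlapping or out-of-bounds layouts infinitely costly, and simple stochastic hill climbing searches within the valid space:

```python
import math
import random

def arrange_shelf(n, shelf_w, item_w, iters=5000, seed=1):
    """Place n items of width item_w on a shelf of width shelf_w.

    Energy = -sum(log(gap)) over the n+1 gaps: a log-barrier that is
    infinite for invalid layouts and, when minimized, spreads the items
    evenly (a toy stand-in for a learned style term). Assumes
    shelf_w / n > item_w so a valid initial layout exists.
    """
    rng = random.Random(seed)

    def gaps(xs):
        s = sorted(xs)
        right_edges = [0.0] + [x + item_w for x in s]
        left_edges = s + [shelf_w]
        return [b - a for a, b in zip(right_edges, left_edges)]

    def energy(xs):
        g = gaps(xs)
        if min(g) <= 0:
            return float("inf")  # barrier: overlapping or out of bounds
        return -sum(math.log(v) for v in g)

    # start from an evenly spaced valid layout
    xs = [i * shelf_w / n + (shelf_w / n - item_w) / 2 for i in range(n)]
    e = energy(xs)
    for _ in range(iters):
        cand = list(xs)
        cand[rng.randrange(n)] += rng.gauss(0, 0.05 * shelf_w)
        ce = energy(cand)
        if ce < e:  # accept only improvements
            xs, e = cand, ce
    return sorted(xs)
```

Because infinite-energy proposals are always rejected, every accepted state stays inside the valid solution space, which is the essential property the barrier provides.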

9.
IEEE Trans Image Process ; 22(7): 2532-44, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23481851

ABSTRACT

For images, gradient-domain composition methods like Poisson blending offer practical solutions for uncertain object boundaries and differences in illumination conditions. However, adapting Poisson image blending to video presents new challenges due to the added temporal dimension. In video, the human eye is sensitive to small changes in blending boundaries across frames and to slight differences between the motions of the source patch and the target video. We present a novel video blending approach that tackles these problems by merging the gradients of the source and target videos and optimizing a consistent blending boundary based on a user-provided blending trimap for the source video. Our approach extends mean-value coordinates interpolation to support hybrid blending with a dynamic boundary while maintaining interactive performance. We also provide a user interface and a source-object positioning method that can efficiently deal with complex video sequences beyond the capabilities of alpha blending.
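Mean-value coordinates interpolation, which the approach builds on, can be sketched for a 2D polygonal boundary as follows (in membrane-style blending the boundary values would be the target-minus-source color differences along the cloning boundary; the names here are illustrative):

```python
import math

def mvc_interpolate(boundary_pts, boundary_vals, x):
    """Mean-value coordinates (Floater) interpolation of boundary
    values to an interior point x of a closed polygon.

    boundary_pts: polygon vertices in order; boundary_vals: one scalar
    per vertex. Weights follow w_i = (tan(a0/2) + tan(a1/2)) / |p_i - x|,
    where a0, a1 are the angles subtended at x by the edges meeting p_i.
    """
    def angle_at(a, b):
        # angle subtended at x by the segment a-b
        va = (a[0] - x[0], a[1] - x[1])
        vb = (b[0] - x[0], b[1] - x[1])
        dot = va[0] * vb[0] + va[1] * vb[1]
        denom = math.hypot(*va) * math.hypot(*vb)
        return math.acos(max(-1.0, min(1.0, dot / denom)))

    n = len(boundary_pts)
    weights = []
    for i in range(n):
        p_prev = boundary_pts[(i - 1) % n]
        p_i = boundary_pts[i]
        p_next = boundary_pts[(i + 1) % n]
        a0 = angle_at(p_prev, p_i)
        a1 = angle_at(p_i, p_next)
        r = math.hypot(p_i[0] - x[0], p_i[1] - x[1])
        weights.append((math.tan(a0 / 2) + math.tan(a1 / 2)) / r)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, boundary_vals)) / total
```

Because the normalized weights sum to one, constant boundary values are reproduced exactly, and the interpolant varies smoothly away from the boundary, which is what makes it a fast substitute for solving a Poisson system.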


Subjects
Computer-Assisted Image Processing/methods, Motion (Physics), Video Recording/methods, Algorithms, Animals, Humans, Poisson Distribution
10.
IEEE Comput Graph Appl ; 33(4): 62-72, 2013.
Article in English | MEDLINE | ID: mdl-24808060

ABSTRACT

A new system provides a virtual experience akin to trying on clothing. It clones the user's photographic image into a catalog of images of models wearing the desired garments. Simple offline training extracts the user's head. Segmentation accurately separates the face, hair, and background, employing both a three-kernel statistical model and graph cuts. The system adjusts the resulting image's skin color according to a statistical model and relights the head via spherical harmonics. Finally, using a parametric model, the system warps the clone's body dimensions to fit the user's dimensions. This creates high-quality compositions of the user's image and the given garment.

11.
IEEE Trans Vis Comput Graph ; 19(5): 824-37, 2013 May.
Article in English | MEDLINE | ID: mdl-22732681

ABSTRACT

We present PoseShop, a pipeline for constructing a segmented human-image database with minimal manual intervention. By downloading, analyzing, and filtering massive numbers of human images from the Internet, we build a database containing 400,000 human figures segmented out of their backgrounds. The human figures are organized by action semantics and clothing attributes, and indexed by the shape of their poses. They can be queried using either a silhouette sketch or a skeleton to find a given pose. We demonstrate applications of this database to multi-frame personalized content synthesis in the form of comic strips, where the main character is the user or one of their friends. We address the two challenges of such synthesis, namely personalization and consistency over a set of frames, by introducing head-swapping and clothes-swapping techniques. We also demonstrate an action-correlation analysis application to show the usefulness of the database for vision applications.


Subjects
Computer Graphics, Database Management Systems, Factual Databases, Three-Dimensional Imaging/methods, Information Storage and Retrieval/methods, Posture/physiology, Whole-Body Imaging/methods, Algorithms, Humans, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Reproducibility of Results, Sensitivity and Specificity, User-Computer Interface
12.
IEEE Trans Vis Comput Graph ; 13(2): 261-71, 2007.
Article in English | MEDLINE | ID: mdl-17218743

ABSTRACT

A 3D shape signature is a compact representation of some essence of a shape. Shape signatures are commonly utilized as a fast indexing mechanism for shape retrieval. Effective shape signatures capture global geometric properties that are scale, translation, and rotation invariant. In this paper, we introduce an effective shape signature that is also pose-oblivious, meaning that it is insensitive to transformations that change the pose of a 3D shape, such as skeletal articulations. Although some topology-based matching methods can be considered pose-oblivious as well, our new signature retains the simplicity and speed of signature indexing. Moreover, contrary to topology-based methods, the new signature is also insensitive to topology changes of the shape, allowing us to match similar shapes with different genus. Our shape signature is a 2D histogram combining the distributions of two scalar functions defined on the boundary surface of the 3D shape. The first is a novel function we call the local-diameter function, which measures the diameter of the 3D shape in the neighborhood of each vertex; the histogram of this function is an informative measure of the shape that is insensitive to pose changes. The second is the centricity function, which measures the average geodesic distance from one vertex to all other vertices on the mesh. We evaluate and compare a number of methods for measuring the similarity between two signatures, and we demonstrate the effectiveness of our pose-oblivious shape signature within a 3D search-engine application on different databases containing hundreds of models.
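Assuming the two per-vertex scalar functions (local diameter and centricity) have already been computed for a mesh, assembling and comparing the 2D-histogram signature might look like the sketch below (the bin count and the L1 distance are illustrative choices; the paper evaluates several dissimilarity measures):

```python
import numpy as np

def shape_signature(local_diameter, centricity, bins=8):
    """2D histogram signature from two per-vertex scalar functions.

    Each function is normalized to [0, 1] over the mesh so the signature
    is scale-invariant, and bin counts are normalized to sum to 1 so
    meshes with different vertex counts are comparable.
    """
    def norm(v):
        v = np.asarray(v, dtype=float)
        lo, hi = v.min(), v.max()
        return (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)

    H, _, _ = np.histogram2d(norm(local_diameter), norm(centricity),
                             bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    return H / H.sum()

def signature_distance(h1, h2):
    """L1 distance between two signatures (one simple choice among the
    dissimilarity measures one could use for retrieval ranking)."""
    return float(np.abs(h1 - h2).sum())
```

For retrieval, each database model stores its signature once, and a query ranks models by `signature_distance`, which is what keeps signature indexing fast compared with topology-based matching.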


Subjects
Database Management Systems, Factual Databases, Computer-Assisted Image Interpretation/methods, Three-Dimensional Imaging/methods, Information Storage and Retrieval/methods, Automated Pattern Recognition/methods, Subtraction Technique, Algorithms, Artificial Intelligence, Image Enhancement/methods, Reproducibility of Results, Sensitivity and Specificity